Bayesian hidden Markov model analysis of single-molecule force spectroscopy: 
Characterizing kinetics under measurement uncertainty 
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Single-molecule force spectroscopy has proven to be a powerful tool for studying the kinetic be- 
havior of biomolecules. Through application of an external force, conformational states with small 
or transient populations can be stabilized, allowing them to be characterized and the statistics of in- 
dividual trajectories studied to provide insight into biomolecular folding and function. Because the 
observed quantity (force or extension) is not necessarily an ideal reaction coordinate, individual ob- 
servations cannot be uniquely associated with kinetically distinct conformations. While maximum- 
likelihood schemes such as hidden Markov models have solved this problem for other classes of 
single-molecule experiments by using temporal information to aid in the inference of a sequence of 
distinct conformational states, these methods do not give a clear picture of how precisely the model 
parameters are determined by the data due to instrument noise and finite-sample statistics, both sig- 
nificant problems in force spectroscopy. We solve this problem through a Bayesian extension that 
allows the experimental uncertainties to be directly quantified, and build in detailed balance to fur- 
ther reduce uncertainty through physical constraints. We illustrate the utility of this approach in 
characterizing the three-state kinetic behavior of an RNA hairpin in a stationary optical trap. 



I. INTRODUCTION 

Recent advances in biophysical measurement have led to 
an unprecedented ability to monitor the dynamics of sin- 
gle biological macromolecules, such as proteins and nucleic 
acids [1]. As a new approach to probing the behavior of bio- 
logical macromolecules, these experiments promise to change 
the way we study folding, dynamics, catalysis, association, 
transcription, translation, and motility, providing otherwise- 
inaccessible information about microscopic kinetics, energet- 
ics, mechanism, and the stochastic heterogeneity inherent in 
these processes. Advances in instrumentation for optical force 
spectroscopy in particular have produced instruments of ex- 
traordinary stability, precision, and temporal resolution [2, 3] 
that have already demonstrated great utility in the study 
of biomolecules in the presence of externally perturbative 
forces [4-6]. Under external force, it becomes possible to sta- 
bilize and characterize short-lived conformational states, such 
as protein folding and unfolding intermediates [7-9]. 

In a typical single-molecule optical trapping experiment, a 
protein or nucleic acid is tethered to two polystyrene beads 
by dsDNA handles that prevent the molecule under study 
from interacting with the beads (see Figure 1). The handle- 
biomolecule-handle assembly — referred to as a fiber — is asso- 
ciated with the beads through tight noncovalent interactions, 
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with one bead held in an optical trap and the other either suc- 
tioned to a micropipette or held in a second optical trap. Dur- 
ing an experiment, the position of the bead within the laser 
trap is monitored, and either the relative displacement from 
the trap center or the total force on the bead is recorded, re- 
sulting in a timeseries such as the one depicted in Figure 2. 
The instrument can generally be operated in several modes: a 
force ramp mode, in which the trap is translated rapidly enough 
to potentially carry the system out of equilibrium; an equilib- 
rium passive mode, in which the trap is held fixed; and a con- 
stant force-feedback mode, in which the trap is continually repo- 
sitioned to maintain a set constant force on the fiber. Here, we 
concern ourselves with the latter two classes of experiment, 
though nonequilibrium experiments remain an exciting topic 
of active research [10]. 

Often, the dynamics observed in these experiments ap- 
pears to be dominated by stochastic transitions between two 
or more strongly metastable conformational states [11, 12] — 
regions of conformation space in which the system remains 
for long times before making a transition to another confor- 
mational state. These transitions are generally well-described 
by first-order kinetics [13]. While visual inspection of the dy- 
namics may suggest the clear presence of multiple metastable 
states, quantitative characterization of these states is often dif- 
ficult. First, the observed force or extension is unlikely to cor- 
respond to a true reaction coordinate easily able to separate 
all metastable states [14-17], and second, measurement noise 
may further broaden the force or extension signatures of indi- 
vidual states, increasing their overlap. Attempting to separate 
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FIG. 1. Single-molecule optical trapping configuration. The 

biomolecule of interest — here, the p5ab RNA hairpin — is teth- 
ered to two polystyrene beads by dsDNA handles. The fluc- 
tuating force on one bead held in an optical trap is monitored, 
while the other bead is held suctioned to a micropipette tip. 
Conformational transitions of the hairpin — such as transitions 
among the three kinetically metastable states illustrated here — 
are observed indirectly through motion of the bead in the trap. 



these states by simply dividing the observed force or extension 
range into regions, following current practice [18, 19], can of- 
ten lead to a high degree of state mis-assignment that results in 
the estimated rate constants and state distributions containing 
a significant amount of error [20] (see Supplementary Material: 
Comparison with threshold model). 

Hidden Markov models (HMMs) [21], which use tempo- 
ral information in addition to the instantaneous value of 
the observable (force or extension) to determine which con- 
formational states the system has visited during the exper- 
iment, have provided an effective solution to the hidden 
state problem in many other classes of single-molecule exper- 
iments, such as ion channel currents [22-25], single-molecule 
FRET [26-30], and the stepping of motor proteins [31-33]. In 
applying hidden Markov modeling to the analysis of single- 
molecule force spectroscopy data, the observed force or ex- 
tension trace is assumed to come from a realization of an 
underlying Markov chain, where the system makes history- 
independent transitions among a set of discrete conforma- 
tional states with probabilities governed by a transition or rate 
matrix. Data, in the form of force or bead-to-bead extension 
measurements, is sampled at an interval that ensures that se- 
quential observations satisfy the Markov property of history- 
independence, though the appropriate interval depends on 
the properties of the experimental configuration. Under a 
given set of external force conditions, each state has a distri- 
bution of forces or extensions associated with it. Given ob- 
served timeseries data for forces or extensions, the maximum 
likelihood estimate (MLE) of the model parameters (transition 
rates and state force or extension distributions) and sequence 
of hidden states corresponding to the observed data can be 
determined by standard methods [34, 35], as demonstrated in 
recent work [36]. 

Unfortunately, this approach has a number of significant 
drawbacks. Due to technical limitations, experiments often 
suffer from limited statistics — the events of interest (transi- 



tions between states or visits to rare states) may occur only 
a few times during the course of the measurement, and data 
for additional fibers is time-consuming to collect. As a result, 
while the MLE yields the most likely set of model parame- 
ters, there may be enormous uncertainty in some of these pa- 
rameters, and the uncertainty in multiple parameters may be 
correlated in complex nonlinear ways. While methods exist 
for estimating the standard error or confidence intervals from 
MLHMMs [37], these schemes can be prohibitively costly for 
long traces, and may still significantly underestimate the sta- 
tistical error for short traces due to the normally-distributed 
error approximation inherent in the approach. The high cost 
(both in terms of instrument and experimenter time) of collect- 
ing additional data also means that it is not a simple task to 
judge how much data need be collected to test a particular hy- 
pothesis in a statistically meaningful way. Worse yet, the stan- 
dard algorithms employed to find the MLE may not even find 
the true maximum likelihood solution, instead converging to 
a local maximum in likelihood that is far from optimal [38]. 

Here, we resolve this issue through the use of a Bayesian 
extension of hidden Markov models [39-42] applicable to sin- 
gle molecule force experiments. By sampling over the pos- 
terior distribution of model parameters and hidden state as- 
signments instead of simply finding the most likely values, 
the experimenter is able to accurately characterize the corre- 
lated uncertainties in both the model parameters (transition 
rates and state force or extension distributions) and hidden 
state sequences corresponding to observed data. Addition- 
ally, prior information (either from additional independent 
measurements or physical constraints) can be easily incorpo- 
rated. We also include a reversibility constraint on the tran- 
sition matrix — in which microscopic detailed balance is im- 
posed on the kinetics, as dictated by the physics of equilibrium 
systems [43] — which has been shown to significantly reduce 
statistical uncertainties in data-poor conditions [44, 45]. The 
framework we present is based on Gibbs sampling [46, 47], al- 
lowing simple swap-in replacement of models for observable 
distributions, extension to multiple observables, and alterna- 
tive models for state transitions. Additionally, the Bayesian 
method provides a straightforward way to model the statisti- 
cal outcome and assess the utility of additional experiments 
given some preliminary data, allowing the experimenter a 
powerful tool for assessing whether the cost of collecting ad- 
ditional data is outweighed by their benefits. A Matlab im- 
plementation of this approach is available online [http: // 
simtk . org/home/bhmm]. 



II. HIDDEN MARKOV MODELS FOR FORCE 
SPECTROSCOPY 

Suppose the temporal behavior of some observable 0(x) 
that is a function of molecular configuration x — here, gener- 
ally force or molecular extension — is observed at temporal in- 
tervals At to produce a timeseries o t/ where t = 0, 1, . . . , L. 
An instantaneous observation o t does not necessarily contain 
enough information to unambiguously identify the current 
conformational state the molecule occupies; to infer the hid- 
den state, we must also make use of the temporal information 
in the observed trace. We restrict ourselves to consideration of 
scalar functions 0(x), but the generalization to multidimen- 
sional probes (or multiple probes, such as combined force and 
fluorescence measurements [48]) and multiple observed tern- 
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poral traces is straightforward. 

We presume the system under study has M kinetically dis- 
tinct states, in the sense that the system generally remains in 
a given state for several observation intervals At, but these 
states may not necessarily represent highly populated states 
of the system at equilibrium. We treat these conformational 
states as the hidden states of the model, because we cannot di- 
rectly observe the identity of the metastable state in which the 
system resides. The hidden Markov model presumes the ob- 
served data O = {o t } was generated according to the follow- 
ing model dependent on parameters = {T, E}, where T 
is an M x M row-stochastic transition matrix and E a set of 
emission parameters governing the observable (force or exten- 
sion) distributions for each of the M hidden states, and prior 
information about the initial state distribution p, 

n*o) = p SQ 

¥(s t \s t _ 1 ,T) = T St _ iSt , t>l 
F(o t \s u e St ) = <p(o t \e St ). (1) 

In diagrammatic form, the observed state data {o t } and corre- 
sponding hidden state history {s t } can be represented 

P T T T T 

— > So — > Si — > s 2 — > • • • — > s L 

i(p i(p i(p lip V) 

Oq 1 2 O l 

The initial state distribution p reflects our knowledge of the 
initial conditions of the experiment that collected data o. In the 
case that the experiment was prepared in equilibrium, p cor- 
responds to the equilibrium distribution tt of the model tran- 
sition matrix T; if the experiment was prepared out of equilib- 
rium, p may be chosen to reflect some other prior distribution 
(e.g. the uniform prior). 

State transitions (s t _ 1 — >> s t ) are governed by the discrete 
transition probability T 3 3t . The Markov property of HMMs 
prescribes that the probability that a system originally in state 
% at time t is later found in state j at time t+ 1 is dependent only 
on knowledge of the state i, and given by the corresponding 
matrix element Tij of the (row-stochastic) transition matrix T. 
Alternatively, one could instead use the rate matrix K, related 
to the transition matrix T through the equation T = e KA< . 
If the processes described by T or K are slow compared to 
the observation interval At, then we can easily estimate the 
rate matrix from the associated transition matrix in a way that 
avoids the matrix logarithm, through the expansion K « (T — 
I) / At, where I denotes the M x M identity matrix. 

The probabilistic "emission" of observables from each state 
(s t — ► o t ) is governed by the continuous emission probability 
(p(o t | e s J, parametrized by observable emission parameters e. 
For example, in the force spectroscopy applications described 
here, ip(o | e s ) is taken to be a univariate normal (Gaussian) 
distribution parameterized by a mean \i and variance a 2 that 
characterize each state, such that = {/Xi, a 2 }. Other choices 
of observable distribution can easily be substituted in a mod- 
ular way without affecting the structure of the algorithms pre- 
sented here. 

Given the HMM process specified in Eq. 1, the probability 
of observing data O given the model parameters is then, 

L 

P(0 I 0) = J2ps v(o I e So )rj T s t -i^( * I e. t ), (3) 

S t=l 



where the sum over hidden state histories S is shorthand for 

MM M 

£=££-£• w 

S s = l s^l s L —l 

If multiple independent traces {o t } are available, the probabil- 
ity P(0 | 0) is simply the product of Eq. 3 for the independent 
traces. 



A. Maximum likelihood hidden Markov model 
(MLHMM) 

The standard approach to construct an HMM from ob- 
served data is to compute the maximum likelihood estimator 
(MLE) for the model parameters = {T, E}, which maxi- 
mize the probability of the observed data O given the model, 

argmaxP(0 | 0), (5) 

yielding MLE estimates of transition matrix T and state emis- 
sion parameters E. Typically, determination of the model 
parameters is carried out using the Baum-Welch algo- 
rithm [34]. 

Once the MLE parameters are determined, the most 
likely hidden state history that produced the observations O 
can be determined using these parameters: 

S = argmaxP(S | O,0). (6) 

This is typically carried out using the Viterbi algorithm [35], a 
classic example of dynamic programming. 



B. Bayesian hidden Markov model (BHMM) 

Instead of simply determining the model that maximizes 
the likelihood of observing the data O given the model param- 
eters 0, we can make use of Bayes' theorem to compute the 
posterior distribution of model parameters given the observed 
data: 

P(0 | O) oc P(0 | 0)P(0). (7) 

Here, P(0) denotes a prior distribution that encodes any a pri- 
ori information we may have about the model parameters 0. 
This prior information might include, for example, physical 
constraints (such as ensuring the transition matrix satisfies de- 
tailed balance) or prior rounds of inference from other inde- 
pendent experiments. 

Making use of the likelihood (Eq. 3), the model posterior is 
then given by, 

L 

P{& | O) cx P(0)^>^(o o | e ao )rj T ^-i-^(°t I e * t X 8 ) 

S t = l 

Drawing samples of from this distribution will, in princi- 
ple, allow the confidence with which individual parameters and 
combinations thereof are known, given the data (and subject 
to the validity of the model of Eq. 1 in correctly representing 
the process by which the observed data is generated). While 
the posterior P(0|O) is complex, we could in principle use 
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a Markov chain Monte Carlo (MCMC) approach [47] to sam- 
ple it. In its current form, however, this would be extremely 
expensive due to the sum over all hidden state histories S 
appearing in ratios involving Eq. 8. Instead, we introduce 
the hidden state histories S as an auxiliary variable, sampling 
from the augmented posterior, 



P(0,S | O) 



p(0) 



(9) 

which makes it much less costly to compute the ratios required 
for MCMC on the augmented (0, S) parameter space. 

If we presume the prior is separable, such that P(0) = 
P(T)P(E), we can sample from the augmented posterior 
(Eq. 9) using the framework of Gibbs sampling [47], in which the 
augmented model parameters are updated by sampling from 
the conditional distributions, 

L 

P(S I T, E, O) oc p flo p(o | e s j[]T Vi ^(o, | e s J 



P(T | E, S, O) = P(T | S) oc P(T) \[ T St _ iSt 

t=i 

L 

P(E | S, T, O) = P(E | S, O) oc P(E) FJ V {o t | e. t ). (10) 

t = 

The equalities on the second and third lines reflect the con- 
ditional independence of the hidden Markov model defined 
by Eq. 1. When only the model parameters = {T,E} 
or the hidden state histories S are of interest, we can sim- 
ply marginalize out the uninteresting variables by sampling 
from the augmented joint posterior for {T, E, S} and examine 
only the variables of interest. In addition, the structure of the 
Gibbs sampling scheme above allows individual components 
(such as the observable distribution model cp(o | e) or transi- 
tion probability matrix T) to be modified without affecting the 
structure of the remainder of the calculation. 

In the illustrations presented here, we employ a Gaussian 
observable distribution model for cp(o | e), 



<f(o I 



<f(o I ^o- 



27TCT 



exp 



1 (o - M) 2 

2 a 2 



, (11) 



where \i is the mean force or extension characterizing a partic- 
ular state, and a is the standard deviation or width of forces or 
extensions corresponding to that state. We note that marginal 
posterior distributions of each mean P(m\0) reflect the sta- 
tistical uncertainty in how well the mean force or position is 
determined, and need not correspond to the standard devia- 
tion <ji, which may be much broader (or narrower, depending 
on the situation). 



1. Observable parameter estimation 

We first initialize the observed distributions of each state by 
fitting a Gaussian mixture model with M states to the pooled 
observed data O, ignoring temporal information: 

L M 

P(0 | 7T, E) = FJ 7T m ^(o t | flm, (&), (12) 

t=0 m=l 

where the state observable emission probability vector E = 
{ei, . . . , e M } and e m = {/i m , a m } with /i m denoting the ob- 
servable mean and a m the variance for state m for the Gaus- 
sian mixture model. The vector iz is composed of equilibrium 
state populations {m, . . . , ttm} with 7r m > and X^m=i 71 m = 
1. 

A first approximation to 7r and E is computed by pooling 
and sorting the observed o t , and defining M indicator func- 
tions hm (o) that separate the data into M contiguous regions 
of the observed range of o of roughly equal population. Let 
N m = J2t=o hrn(o t ) denote the total number of observations 
falling in region m, and iV to t = Sm=i ^ rn - ^ ne i R i na l param- 
eters are then computed as, 

7T m = Nm/N tot 
L 

flm = N' 1 ^ O t h m (o t ) (13) 



t=0 
L 



CT m ^2(o t - fl m ) 2 hm{p t ). 



(14) 



This approximation is then improved upon by iterating the 
expectation-maximization procedure described by Bilmes [49], 

L 

TTm ^tot XI Xm(o t , E, 7r) 
L 

L 

cr ,2 m = (7r m A^tot) _1 ^2(o t - n' m ) 2 Xm(o t , E, 7T) (15) 

where the function x™ (o, E, 7r) is given by the fuzzy member- 
ship function, 



Xm(o, E,7T) 



7r m cp(o | e m ) 

M 

ni (p(o | ei) 



(16) 



The iterative procedure is terminated at iteration j when the 
change in the parameters {71-, cr 2 } falls below a certain rela- 
tive threshold, such as ||7r^ ] - 7r [j ~ 1] h/W^ || 2 < 10~ 4 . 



III. ALGORITHMS 

A. Generating an initial model 

To initialize either computation of the MLHMM or sam- 
pling from the posterior for the BHMM, an initial model that 
respects any constraints imposed in the model prior P(0) 
must be selected. Here, we employ a Gaussian observable 
distribution model for cp(o \ e) (Eq. 11) and enforce that the 
transition matrix T satisfy detailed balance. 



2. Transition matrix estimation 

Once initial state observable emission parameters E are de- 
termined, an initial transition matrix is estimated using an iter- 
ative likelihood maximization approach that enforces detailed 
balance [50]. First, a matrix of fractional transition counts 
C = (cij) is estimated using the membership function: 

L 

Ci J = ^Xi(ot-i,E,7r)xj(ot,E,7r) (17) 

t=i 
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A symmetric M x M matrix X = (xij) is initialized by 

Xij — Xji — Cij -\~ Cji. (1^) 

The iterative procedure described in Algorithm 1 of [50] is then 
applied. For each update iteration, we first update the diago- 
nal elements of X: 



Ci* C%\ 



— ^ ^ Cjj i X%* — ^ ^ %ij ; (1^) 



3 = 1 3=1 

followed by the off-diagonal elements: 

-b+Vb 2 -4ac 



(20) 



13 ~ 31 2a 
where the quantities a, b, and c are computed from X and C, 

& = Ci* Cij ~\~ Cj* Cji 

b = Ci*(^Xj* Xji} ~\~ Cj*(^Xi* ^"ij ') 

(Cij Cji^i^Xi* H - ^ji) 

C = (.Cij H - Cji)(Xi* Xij^){x j* Xji}. (^-0 

Once a sufficient number of iterations j have been completed 
to compute a stable estimate of X (such as the relative conver- 
gence criteria ||X [j] -X [J - 1] || 2 /||X [J ' ] ||2 < 10" 4 , the maximum 
likelihood transition matrix estimate T is computed as 



Xij I Xi* 



(22) 



Note that the equilibrium probability vector 7r computed dur- 
ing the Gaussian mixture model fitting is not respected during 
this step. 



B. Fitting a maximum likelihood HMM 

The HMM model parameters = {T, E} are fit to the ob- 
served data O through use of the expectation-maximization 
(EM) algorithm [51]. This is an iterative procedure, where the 
model parameters are subsequently refined through succes- 
sive iterations. The initial HMM is usually quick to compute, 
and can give the experimenter a rough idea of the model pa- 
rameters, as well as providing a useful starting point for sam- 
pling models from the Bayesian posterior. 

During each iteration, the Baum-Welch algorithm [34] is 
used to compute S = (£ t ij)/ which represents the probabil- 
ity that the system transitions from hidden state i at time t — 1 
to hidden state j at time t, and j ti , the probability that the sys- 
tem occupied state i at time t. This is accomplished by first 
executing the forward algorithm, 

n _ j Pj ^(°o | ej) t = 

a tj - < ( , , ^ M \2.5) 
\ip\Qt | ej) 2^. =1 a^^iTij t = l,...,L 

followed by the backward algorithm, 

PU = {e^! TiM<H+i I e^^t+iW t = (L — 1), . . . , (24) 

The LxMxM matrix S is then computed for t = 0, . . . , (L— 1) 
as, 

M 

Ztij = otti(p(o t +i I ei)TijP( t +i)j/ ^ctn (25) 



M 

Iti ^2 

j=i 



(26) 



In practice, the logarithms of these quantities are computed 
instead to avoid numerical underflow. 

The aggregate matrix of expected transition counts 
C = (cij) is then computed from S as, 



tij • 



(27) 



This count matrix is used to update the maximum-likelihood 
transition matrix T using the method of Prinz et al. [50] de- 
scribed in the previous section. 

The state observable distribution parameters E are then up- 
dated from the ju- For the univariate normal distribution ap- 
plied to force spectroscopy data here, we update the mean fii 
and variance of for state i using the scheme, 

L L 

E °tiu E(°* - v'ifiti 

l t=0 /2 t=0 /00 x 

Pi —j ; <* i = . (28) 



L 

E iu 

t=0 



L 
t=0 



Once the model parameters have been fitted by iteration of 
the above update procedure to convergence (which may only 
converge to a local maximum of the likelihood), the most likely 
hidden state sequence can be determined given the observa- 
tions O and the MLE model using the Viterbi algorithm [35]. 
Like the forward-backward algorithm employed in the Baum- 
Welch procedure, the Viterbi algorithm also has a forward re- 
cursion component, 



Pj<f(ot | ej 



t = 



€3t = VV 7 V t * i t (29) 

\ip(pt I e^-J max; ent-i)±ij t = 1, . . . , L 



1 t = 

arg maxj e^^Tij t = 1, . . . , L 



as well as a reverse reconstruction component to compute the 
most likely state sequence S, 

gma^ t = L 
>» t+l( t+i) t=(L-l),...,0 



C. Sampling from the posterior of the BHMM 

Sampling from the posterior of the BHMM (Eq. 8) proceeds 
by rounds of Gibbs sampling, where each round consists of 
an update of the augmented model parameters {T, E, S} by 
sampling 



S' | T,E,0 
T" | S' 
E' I S ,0 



P(S' | T,E,0) 
P(T' | S') 
P(E' | S',0) 



where the conditional probabilities are given by Eq. 10. 



1. Updating the hidden state sequences 

We use a modified form of the Viterbi process to generate 
an independent sample of the hidden state history S given the 
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transition probabilities T, state observable distribution param- 
eters E, and observed data O. Like the Viterbi scheme, a for- 
ward recursion is applied to each observation trace o, but in- 
stead of computing the most likely state history on the reverse 
pass, a new hidden state history S is drawn from the distri- 
bution P(S | O, T, E). The forward recursion uses the same 
forward algorithm as used in Baum- Welch [34], 

fpj(p(oo\ej) t = 

yP\Pt I ej) 2^. =1 ot( t -i)iTij t = 1, . . . , L 

In the reverse recursion, we now sample a state sequence by 
sampling each hidden state from the conditional distribution 
s t ~ P(s t | st+ij ■ • • ? s l) starting from t — L and proceeding 
down to t = 0, where the conditional distribution is given by, 



P(s t = i | st+i, . . . , s L ) 

OLti/ Z^ j=1 Oitj 



(32) 



t L 



OLtiTi St+1 /J2j=i a tjT jst+1 t = (L- 1),...,0 



It is straightforward to show the result of these sampling steps 
reconstitutes the probability distribution P(S|T, E, O) (see 
Supplementary Material: Proof of state history sampling scheme). 



2. Updating the transition probabilities 

If no detailed balance constraint is used and the prior P(T) 
is Dirichlet in each row of the transition matrix T, it is possi- 
ble to generate an independent sample of the transition matrix 
from the conditional distribution P(T ; | S') by sampling each 
row of the transition matrix from the conjugate Dirichlet pos- 
terior using the transition counts from the sampled state se- 
quence S' [44]. However, because physical systems in the ab- 
sence of energy input through an external driving force should 
satisfy detailed balance, we make use of this constraint in up- 
dating our transition probabilities, since this has been demon- 
strated to substantially reduce parameter uncertainty in the 
data-limited regime [44]. 

The transition matrix is updated using the reversible transi- 
tion matrix sampling scheme of Noe [44, 52]. Here, an adjusted 
count matrix C = (c^ ) is computed using the updated hidden 
state sequence S', 



(33) 



where the Kronecker Sij = 1 if i = j and zero otherwise, and 
B = {bij) is a matrix of prior pseudocounts, which we take 
to be zero following the work of Noe et al. [13]. Using the 
adjusted count matrix C, a Metropolis-Hastings Monte Carlo 
procedure [53] is used to update the matrix and produce a new 
sample from P(T' | S'). Two move types are attempted, se- 
lected with equal probability, and 1000 moves are attempted to 
generate a new sample T' that is approximately uncorrelated 
from the previous T. Prior to starting the Monte Carlo proce- 
dure, the vector of equilibrium probabilities for all states 7r is 
computed according to 



T T 7T: 



(34) 



The first move type is a reversible element shift. A pair of 
states (i, j), i ^ j, are selected with uniform probability, and a 



random number A is selected uniformly over the interval, 

A e [max(-T ii ,-^T ii ),T ii ]. 



The changed elements in the proposed transition matrix T ; are 
then given by: 



T- — T — A • T '■■ — T 



7ti = T tt + A ; TjA = Tjj + — A. 



' ± 33 ~ ± 33 



This move is accepted with probability 



P a cce Pt (T'|T) = min I 1, J ^ + K 05) 



T'f \ c ii / ^T 1, \ c ij / \ c jj / '"P 1 \ c ji " 



Tii J V Tij J \ Tjj 



This move will leave the vector of stationary probabilities n 
unchanged. 

The second move type is a row shift. A row i of T is selected 
with uniform probability, and a random number 77 chosen uni- 
formly over the interval 



0, 



and used to update row i of T according to 



r\{Tu 



j = l,...,M, j^i 
1) + 1 j = i 



(36) 



This move is accepted with probability 

Pacce P t(T , |T) = min/l,^- 2 ^^" 6 * 



7/(1 



(37) 



The row shift operation will change the stationary distribution 
of 7T 7 , but it may be efficiently updated with 



rjTTj 



7Ti + ?7(1 - TYi 



TYi + 77(1 - TTi 



Since this update scheme is incremental, it will accumulate nu- 
merical errors over time that cause the updated 7r to drift away 
from the stationary distribution of the current transition ma- 
trix. To avoid this, n is recomputed from the current sample 
of the transition matrix in regular intervals (here, every 100 
sampling steps). 



3. Updating the observable distribution parameters 

Following the update of the transition matrix T, the ob- 
servable distribution parameters E are updated by sampling 
E from the conditional probability P(E ; | S x , O). The condi- 
tional probability for the observable distribution parameters 
for state m, denoted e m , is given in terms of the output model 
cp(o I e) by Bayes' theorem, 



P(E I 0,S) 



P(E) 



(38) 
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An important choice must be made with regards to the 
prior, P(E). If the prior is chosen to be composed of inde- 
pendent priors for each state, as in 



P(E) = FJ P(e„ 



(39) 



then the full BHMM posterior (Eq. 8) will be invariant under 
any permutation of the states. This behavior might be undesir- 
able, as the states may switch labels during the posterior sam- 
pling procedure; this will require any analysis of the models 
sampled from the posterior to account for the possible permu- 
tation symmetry in the states. On the other hand, breaking this 
symmetry (e.g., by enforcing an ordering on the state mean ob- 
servables) can artificially restrict the confidence intervals of the 
states, which might additionally complicate data analysis. 

Here, we make the choice that the prior be separable 
(Eq. 39), which has the benefit of allowing the conditional 
probability for E (Eq. 38) to be decomposed into a separate 
posterior for each state. For each state m, collect all the obser- 
vations o t whose updated hidden state labels s t ' — m into a 
single dataset o = {o n }n=i/ where N m is the total number of 
times state m is visited, for the purposes of this update pro- 
cedure. Then, the observable parameters e for this state are 
given by 



P(e | o) = P(o | e)P(e) 



P(e). (40) 



In the application presented here, we use a Gaussian output 
model (Eq. 11) for the state observable distributions P(o | e), 
where e = {/i, a 2 }, with \i the state mean observable and a 2 
the variance (which will include both the distribution of the 
observable characterizing the state and any broadening from 
measurement noise). Other models (including multidimen- 
sional or multimodal observation models) are possible, and 
require replacing only the observation model cp(o | e) and cor- 
responding prior P(e). 

We use the (improper) Jeffreys prior [54] which has the 
information-theoretic interpretation as the prior that maxi- 
mizes the information content of the data [55], (suppressing 
the state index subscript m), 



P(e) oc a" 



which produces the posterior 



P(e | o) oc a 



-(iV+l) 



exp 



1 



2a 2 



(41) 



(42) 



where we remind the reader that here and in the remainder of 
this section, the symbols e, o, a, \i, and N refer to e m , o m , cr m , 
/i m , and Nm, respectively. 

Updating {/x, a 2 } also proceeds by a Gibbs sampling 
scheme, alternately updating \i and a, as earlier described in 
Ref. [52], 



fi ~ P(/i | a ,o) 
a 2 ~P(a 2 |/i,o) 



(43) 



The conditional distribution of the mean \i is then given by 



P(/i | a 2 , o) oc exp 



2(a 2 /N) 



(44) 



where (i is the sample mean for o, the samples in state m, 



1 N 



(45) 



This allows us to update \i according to 

M '~Af(/i,<7 2 /iV) (46) 
The conditional distribution of the variance a 2 is given by 



P(a 2 | /x, o) oc a 



-(iv+i) 



exp 



Na 



(47) 



where the quantity a 2 , which is not in general identical to the 
sample variance, is given by 



n=l 

A convenient way to update a 2 | /i, o is to sample a random 
variate y from the chi-square distribution with N — 1 degrees 
of freedom, 



2/~ X 2 (iV-l) 



and then update a 2 as 



Na 2 /y. 



(49) 



(50) 



Note that \i and a 2 can be updated in either order, but the up- 
dated values of \i or a 2 must be used in sampling the not-yet- 
updated a 2 or \i, and vice-versa. 

Other output probabilities, such as mixtures of normal dis- 
tributions or other distributions, can be substituted by simply 
changing P(E | O, S) and the scheme by which E is updated. 



IV. VALIDATION USING SYNTHETIC DATA 

To verify that our BHMM posterior sampling scheme re- 
flects the true uncertainty in the model parameters, we tested 
the scheme on synthetic data generated from a model with 
known parameters 0*. Given observed data O generated 
from P(0 | 0*), sampling from the posterior P(0 | O) us- 
ing the scheme described in Sampling from the posterior of the 
BHMM will provide us with confidence intervals [0i ow , #hi g h] 
for a specified confidence interval size a £ [0, 1]. If these com- 
puted confidence intervals are accurate, we should find that 
the true model parameter 0* lies in the computed confidence 
interval [0^, 0^ h ] with probability a. This can be tested by 
generating synthetic observed data O from P(0 | 0*) and 

verifying that we find 0* £ [0^2, ^giil ^ n a fraction a of these 
synthetic experiments. 

As an example synthetic model, consider the three-state sys- 
tem intended to mimic a protein with (1) a highly-compliance, 
low-force unfolded state, (2) a moderately compliant low- 
population intermediate at intermediate force, and (3) a low- 
compliance, high-force folded state. Here, the term "compli- 
ance" refers to the width of the force or extension distribution 
characterizing the state. Parameters of the model are given in 
Table I, and the observation interval was taken to be r — 1 
ms. An example realization of a model trajectory, along with 
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FIG. 2. Synthetic force trajectory and inferred state assignments in MLHMM. Observed samples are colored by their hidden 
state assignments. Dark horizontal lines terminating in triangles to the right denote state means, while lightly colored bands 
indicate one standard deviation on either side of the state mean. The gray histogram on the right side shows the total observed 
probability of samples, while the colored peaks show the weighted Gaussian output contribution from each state, and the black 
outline the weighted sum of the Gaussian output contributions from the HMM states. 



TABLE I. Estimated mean model parameters and confidence intervals for synthetic timeseries data 

Estimated Model Parameters 



Property 




True Value 


1 000 observations 


10 000 observations 


100 000 observations 


stationary probability 


7Ti 
7T 2 
7T3 


0.308 
0.113 
0.579 


n QQQ U.48U 
U.ZZO 0.074 

0.093 8:oll 

o.679g:!I§ 


0.318 o!244 

0.124 8ii 
0.5588:Ii 


0.324 8:§§§ 

U.ll^ 0.104 

0.564 8:11? 


transition probability 


Tn 
Ti2 
Ti3 

T21 

T22 


0.980 
0.019 
0.001 
0.053 
0.900 


0.970 g 
0.023 o!oo9 

o.oo7 8:8^f 

0.054 gt 

0.868 o>90 


0.9728:8$ 
0.0268:81? 
0.0028:88? 
0.0678:811 
0.8908:1% 


0.9798:^4 
0.020 8:8?s 
o.ooi8:88i 
0.057 8:81^ 
o.8978:!8l 




T23 
Tsi 
T32 
T33 


0.050 
0.001 
0.009 
0.990 


0.078 8:oll 

0.0028:888 
o.oio8:8Sl 
o.988 8:l?l 


0.0438:833 
0.0018:888 
0.0108:80? 
0.9908:8!? 


0.046 8:818 
0.0018:880 
0.0098:808 
0.9908:811 



state mean force (pN) 


Mi 


3.000 


2.947 £si2 


2.998 18es 


3.001 2^990 


M2 


4.700 


4.666 till 


4.699 1:||| 


4.7021™ 




Ms 


5.600 


5.597 5;583 


5.602 5^96 


5.602 1;|88 


state std dev force (pN) 


<Tl 


1.000 


1.037 S:£g? 


0.992 ;8 6 ? 


0.999 :88? 


(J2 


0.300 


0.254 8:1?? 


n O o 7 0.300 

U.zor 0.275 


0.301 8:i8i 




&3 


0.200 


0.200 8:5go 


0.203 8:?8o 


0.201 8:188 



the MLHMM state assignment, is shown in Figure 2. We gen- 
erated a trajectory of 100 000 observations, and characterized 
the BHMM mean parameter estimate and 95% confidence in- 
tervals for a subset of this trajectory of varying lengths. The 
results, shown in Table I, show that the confidence intervals 
contract as trajectory length increases, as expected, and the 
BHMM-computed 95% confidence intervals contain the true 
model parameters with the expected statistics. In contrast, a 
model created from simply segmenting the observed forces 
into disjoint region and assigning state membership based on 
the force value alone estimates model parameters with signifi- 
cant bias even for 1 000 000 observations (see Supporting Infor- 
mation). 

As a more rigorous test, we sampled 50 random models 
from the prior P(0) with two to six states, generated a 10 
000 observation synthetic trajectory for each, and accumulated 
statistics on the observed fraction of time the true model pa- 
rameters were within the BHMM confidence intervals for var- 
ious values of the confidence interval width a. The results of 
this test are depicted in Supplementary Figure 1. We expect that 
the plot traces the diagonal if the observed and expected confi- 



dence intervals are identical; an overestimate of the confidence 
interval will be above the diagonal, and an underestimate will 
fall below it. Because only a finite number of independent 
replicates of the experiment are conducted, there is some as- 
sociated uncertainty with the observed confidence intervals. 
The results show that the observed confidence intervals line 
up with the expected confidence intervals to within statistical 
error, suggesting the BHMM confidence intervals neither un- 
derestimate nor overestimate the actual uncertainty in model 
parameters. 



V. RNA HAIRPIN KINETICS IN A PASSIVE OPTICAL 
TRAP 

We illustrate the BHMM approach applied to real force 
spectroscopy data by characterizing the average forces and 
transition rates among kinetically distinct states of the p5ab 
RNA hairpin in an optical trap under passive (equilibrium) 
conditions. 

The p5ab RNA hairpin from Tetrahymena thermophilia was 




FIG. 3. Experimental force trajectory of the p5ab hairpin and MLHMM state assignments. Observed samples are colored 
by their hidden state assignments. Dark horizontal lines terminating in triangles to the right denote state means, while lightly 
colored bands indicate one standard deviation on either side of the state mean. The gray histogram on the right side shows the 
total observed probability of samples, while the colored peaks show the weighted Gaussian output contribution from each state, 
and the black outline the weighted sum of the Gaussian output contributions from the HMM states. 



TABLE II. BHMM model estimates for p5ab hairpin data. 


Property 




Value 


Equilibrium probability 


7Tl 
7T2 
7T 3 


0.215 ii:i§g 
0.046 8:81? 
0.7408:??? 


Transition probability (At = 1 ms) 


Tn 

7l2 

Ti 3 
T 2 i 
T22 
T23 

t 3 i 

T32 
T33 


0,954 S 

0.033 CL029 

o.oi3g : 8i! 
o.i54 8:iI8 
0.650 8:127 
o.i96 8:?i8 
0.0048:88! 
0.0128:81? 
0.984 8:11! 


State force mean (pN) 


Mi 
M2 

M3 


12.549 ttm 

13.016 ii:8os 

13.849 Hit! 


State force std dev (pN) 


(J\ 
(72 
03 


o.2io8:&? 
0.201 8:?81 
o.2i3 8:ll? 


Transition rate (s _1 ) 


k\2 
ki3 
k 2 i 
k 2 3 
k 3 i 
k 3 2 


41.4 s§:§ 

9.1ft 3 
194.7 ??!:! 
243.7 111:8 

2.6 i:? 
15.0 HI 


State mean lifetime (ms) 


Tl 
T2 
T 3 


21.9 S:i 
2.9 1:? 
63.i 1:1 


provided by Jin-Der Wen, and prepared as previously de- 
scribed [56]. Within the population of RNA hairpin molecules 
in the examined sample, there were two chemically dis- 
tinct species present in the sample (i.e. as a result of post- 
transcriptional or other covalent modification during sample 
storage), exhibiting either apparent two-state (as reported pre- 
viously [56]) or three-state behavior (studied here). For the 
purposes of testing this method, we examined a fiber that ap- 
peared to consistently exhibit three-state behavior upon visual 
inspection of the force timeseries data. 

The instrument used in this experiment was a dual-beam 
counter-propagating optical trap with a spring constant of 0.1 



pN/nm. A piezoactuator controlled the position of the trap 
and allowed position resolution to within 0.5 nm [57]. Drift 
in the instrument was less than 1 nm/ minute resulting in a 
constant average force within 0.1 pN over the course of a typ- 
ical 60 s experiment. For these constant trap position experi- 
ments, higher frequency data was recorded at 50 kHz record- 
ing the voltage corresponding to the force on the tether di- 
rectly from the position-sensitive detectors. To ensure sequen- 
tial samples obeyed Markovian statistics, these data were sub- 
sampled down to 1 kHz for analysis by the BHMM framework 
after examination of autocorrelation functions for trap posi- 
tions where the hairpin appeared to remain in a single confor- 
mational state (see Supplementary Material: Choice of observation 
interval). 

A single observed force trajectory at a fixed trap position 
adequate to cause hopping among multiple states is shown in 
Figure 3. The most likely state trajectory from the MLHMM fit 
with three states is shown by coloring the observations most 
likely to be associated with each state, with bands of color indi- 
cating the mean and standard deviation about the mean force 
characterizing each state. 

Table II lists the BHMM posterior means and confidence 
intervals characterizing the three-state model extracted from 
this single 60 s observed force trace. Several things are no- 
table about the estimated model parameters. Surprisingly, 
while there is a clearly-resolved intermediate-force state (state 
2) through which most of the flux from the high- and low-force 
states passes (as seen from large K12 and K23), there are non- 
trivial rate constants connecting the high and low force states 
directly (if 13), indicating that while a sequential mechanism 
involving passing through the intermediate state is preferred, 
it may not be an obligatory step in hairpin formation under 
these conditions. While the state mean forces are clearly dis- 
tinct, the state standard deviations — which reflect the width 
of the observed force distribution characterizing each state, 
rather than the uncertainty in state means — possess overlap- 
ping confidence intervals. These standard deviations reflect 
not only contributions from both the distribution of extensions 
sampled by the hairpin in each conformational state, but also 
from fluctuations in the handles and beads, and other sources 
of mechanical and electrical noise in the measurement. As 
we would expect the unfolded hairpin to be more compliant 
(i.e. possess a wider distribution of forces) than the folded hair- 
pin, the inability to distinguish the standard deviations among 
states is suggestive that, for this experimental configuration 
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and observation time, the predominant contribution to the ob- 
served force distributions for each state may be in the form of 
handle or bead fluctuations or other sources of measurement 
noise. 

Finally, the lifetime of the intermediate-force state is signifi- 
cantly shorter than for the low- and high-force states by nearly 
an order of magnitude, and only a few times longer than the 
observation interval of 1 ms — despite this, the lifetime appears 
to be well-determined, as indicated by the narrow confidence 
intervals. 



VI. DISCUSSION 

We have described an approach to determining the first- 
order kinetic parameters and observable (force or extension) 
distributions characterizing conformational states in single- 
molecule force spectroscopy. By use of a Bayesian extension 
of hidden Markov models, we are able to characterize the ex- 
perimental uncertainty in these parameters due to instrument 
noise and finite-size datasets. The use of a detailed balance 
constraint additionally helps reduce the experimental uncer- 
tainty over standard hidden Markov models, as both transi- 
tions into and out of conformational states provide valuable 
information about state kinetics and populations in data-poor 
conditions [44, 45]. Additionally, the Gibbs sampling frame- 
work used to sample from the Bayesian posterior can be easily 
extended to incorporate additional nuisance parameters, such 
as stochastic models of instrument drift or laser power fluctu- 
ations. 

We have opted to make use of a reversible transition ma- 
trix to describe the statistical kinetic behavior between the ob- 
servation intervals At, but it is possible to use a reversible 



rate matrix instead by substituting a rate matrix sampling 
scheme [58] in the appropriate stage of the Gibbs sampling up- 
dates. 

While the experimenter must currently choose the number 
of conformational states by hand, a number of extensions of 
Bayesian hidden Markov models can be used to automati- 
cally determine the number of states best supported by the 
data, including reversible-jump schemes [59, 60] and varia- 
tional Bayes methods [61, 62]. 

We note that the experimenter in principle has access to 
the full posterior distribution of models given the observed 
data, so that instead of looking at the confidence of single pa- 
rameters, confidence intervals in more complex functions of 
parameters — such as the rates or lifetimes in Table II — can be 
computed, or joint posterior distributions of multiple param- 
eters examined. It is also possible to generate synthetic data 
from the current model, or family of models, to examine how 
the collection of additional data will further reduce uncertain- 
ties or allow discrimination among particular hypotheses. The 
field of Bayesian experimental design [63] holds numerous pos- 
sibilities for selecting how future experiments can maximize 
information gain, and whether the information gain from the 
collection of additional data will be of sufficient utility to jus- 
tify the expense. 
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